FLEX: Unifying Evaluation for Few-Shot NLP

We release the FLEX benchmark, which includes four few-shot transfer settings, zero-shot evaluation, and a public leaderboard that covers diverse NLP tasks.

https://github.com/allenai/flex

https://youtu.be/2CeuNW8lIZo?si=pvTaZ7KN3qFGWfIc